Detecting fuzzy community structures in complex networks with a Potts model 
A fast community detection algorithm based on a q-state Potts model is presented. Communities 
in networks (groups of densely interconnected nodes that are only loosely connected to the rest of 
the network) are found to coincide with the domains of equal spin value in the minima of a modified 
Potts spin glass Hamiltonian. Comparing global and local minima of the Hamiltonian allows for 
the detection of overlapping ("fuzzy") communities and quantifying the association of nodes to 
multiple communities as well as the robustness of a community. No prior knowledge of the number 
of communities has to be assumed. 

Finding groups of alike elements in data is of great interest
in all quantitative sciences. For multivariate data,
where the objects are characterized by a vector of attributes,
a number of efficient and well understood clustering
algorithms exist [1]. They allow one to find clusters of
similar objects based on a metric between the attribute
vectors. If, however, the data is of relational form,
e.g., a network or graph G(V, E) consisting of a set V
of N nodes and a set E of M links or edges connecting
them and representing some relation between the nodes, 
the problem of finding alike elements corresponds to dis- 
covering communities: sets of nodes interconnected more 
densely among themselves than with the rest of the network
(for a recent review see ref. [2]). For any induced
subgraph g(v, e) of the graph G(V, E) with n nodes,
m internal edges, and $m_{nN}$ edges connecting the n nodes
to the N - n remaining nodes of the graph, this can be
formalized as:

$$\frac{2m}{n(n-1)} > \frac{2M}{N(N-1)} > \frac{m_{nN}}{n(N-n)}. \tag{1}$$

In other words, the inner link density should be higher
than the average link density in the network, which in turn
should be higher than the outer link density of the community.
As a community structure we thus define a set of
induced subgraphs g(v, e) that covers G(V, E) and
fulfills (1). Note that the problem of community detection
is different from that of minimal cut graph partitioning,
as for g(v, e) to be a community it is not necessary
that the number of external edges is a global minimum. 
Rather, it only needs to be smaller than a certain thresh- 
old that depends on G(V, E) and the size of g(v, e). We 
see from (1) that the presence of communities is tied
to the presence of inhomogeneities in the link distribution
of a graph. Furthermore, it is understood that a
community structure is not defined uniquely on a network.
Rather, a number of community structures differing
in size and number of communities may exist that
all fulfill the inequalities (1). Certain nodes may belong
to the same community in one realization and may be
assigned to a different community in another realization.
The differences and similarities of these realizations yield 
valuable information about the robustness of a particu- 
lar community structure. Furthermore, the nodes which 
can be assigned into more than one community represent 
an overlap of possible community structures that cannot 
be interpreted as a hierarchy of communities, since the 
overlap may only be partial. Here, we introduce a new 
algorithm that can rapidly detect a community structure 
and allows for a quantitative assessment of the individual 
realizations. 
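
As a minimal sketch, the density criterion described above (inner link density above the network average, which in turn lies above the outer link density) can be checked for a candidate node set as follows. The function name `is_community` and the edge-list representation are illustrative assumptions, not part of the method itself.

```python
def is_community(num_nodes, edges, subset):
    """Check inner density > average density > outer density for `subset`.

    `edges` is a list of undirected (u, v) pairs without duplicates;
    `subset` must contain more than one node and fewer than all nodes.
    """
    subset = set(subset)
    n = len(subset)
    total_edges = len(edges)
    # Edges with both endpoints inside the candidate set.
    inner_edges = sum(1 for u, v in edges if u in subset and v in subset)
    # Edges with exactly one endpoint inside the candidate set.
    outer_edges = sum(1 for u, v in edges if (u in subset) != (v in subset))

    inner_density = 2 * inner_edges / (n * (n - 1))
    average_density = 2 * total_edges / (num_nodes * (num_nodes - 1))
    outer_density = outer_edges / (n * (num_nodes - n))
    return inner_density > average_density > outer_density


# Toy graph (an assumption for illustration): two triangles joined by one edge.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(is_community(6, toy_edges, {0, 1, 2}))  # one of the triangles
print(is_community(6, toy_edges, {0, 1, 3}))  # a mixed node set
```

On this toy graph a triangle passes the test while a node set mixing both triangles does not, because its inner density falls below the network average.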

In this paper, we combine the early idea by Fu and 
Anderson for graph bi-partitioning with a modified Ising 
Hamiltonian [3] and the recent Potts model clustering of 
multivariate data by Blatt et al. [4]. This will allow us to
map the communities of a network onto the magnetic domains
in the ground state or in local minima of a suitable
Hamiltonian. For this purpose we alter a q-state Potts
Hamiltonian by adding a global constraint that forces the
spins into communities according to (1):

$$\mathcal{H} = -J \sum_{(i,j) \in E} \delta_{\sigma_i \sigma_j} + \gamma \sum_{s=1}^{q} \frac{n_s (n_s - 1)}{2}. \tag{2}$$

Here, $\sigma_i$, $i = 1 \dots N$, denote the individual spins, which
are allowed to take q values 1...q; $n_s$ denotes the number
of spins that have spin s, such that $\sum_{s=1}^{q} n_s = N$;
J is the ferromagnetic interaction strength; $\gamma$ is a positive
parameter; and $\delta$ is the Kronecker delta. The first
sum is the standard ferromagnetic Potts term for nodes
connected by an edge in the network, and takes its minimum
value $\mathcal{H}_{\mathrm{ferr}} = -JM$. It favors a homogeneous distribution
of spins over the network. Diversity, on the other
hand, is introduced by the second term which sums up 
all possible pairs of spins which have equal value. It 
counter-balances the first sum and increases the energy 
with increasing homogeneity of the spin configuration. It 
represents a global anti-ferromagnetic interaction being 
maximal when all nodes have the same spin, and minimal 
when all possible spin values are evenly distributed over 
all nodes. 
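
The two competing terms can be made concrete with a short sketch. It assumes the Hamiltonian is the ferromagnetic sum over edges with equal spins minus-weighted by J, plus $\gamma$ times the number of equal-spin node pairs, as described in the text; the function name `potts_energy` and the dictionary-based spin assignment are illustrative choices.

```python
from collections import Counter


def potts_energy(edges, spins, gamma, J=1.0):
    """Energy of a spin assignment: -J * (aligned edges) + gamma * (equal-spin pairs).

    `spins` maps each node to its spin value; `edges` is a list of (u, v) pairs.
    """
    # Ferromagnetic term: every edge joining equal spins lowers the energy by J.
    aligned_edges = sum(1 for u, v in edges if spins[u] == spins[v])
    # Global term: count all pairs of nodes sharing a spin, n_s * (n_s - 1) / 2.
    group_sizes = Counter(spins.values())
    equal_pairs = sum(size * (size - 1) // 2 for size in group_sizes.values())
    return -J * aligned_edges + gamma * equal_pairs


# Toy graph (an assumption for illustration): two triangles joined by one edge.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
p = 2 * len(toy_edges) / (6 * 5)  # average connection probability
split = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
merged = {node: 0 for node in range(6)}
print(potts_energy(toy_edges, split, gamma=p))
print(potts_energy(toy_edges, merged, gamma=p))
```

With $\gamma$ set to the average connection probability, the assignment that separates the two triangles has the lower energy of the two, which is the behavior the second term is meant to produce.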
The choice of $\gamma$ determines how strongly the minimum
of the combined Hamiltonian depends on the topology
of the network. Consider a network of two communities
$g_1(n_1, m_1)$ and $g_2(n_2, m_2)$ with $m_{12}$ edges connecting
them. For the ground state to be composed of these two
communities, $\gamma$ has to exceed a simple threshold $\gamma^*$:

$$\gamma > \gamma^* = \frac{J\, m_{12}}{n_1 n_2}. \tag{3}$$

Comparing with (1) we see that, apart from the ferromagnetic
coupling J, $\gamma^*$ is just the outer link density of
community $g_1(n_1, m_1)$. Thus, with the parameter $\gamma$ we
enforce a ground state of the system such that all groups
of nodes with equal spin have an outer link density
smaller than $\gamma$. Setting J = 1 and $\gamma$ to be the average
connection probability of the network, p = 2M/(N(N-1))
(or $\gamma = p \langle J_{ij} \rangle$ for weighted networks), we thus satisfy
the second inequality in (1). The first inequality in (1)
is satisfied implicitly, because high inner link densities
are energetically favored by the Hamiltonian. Different
local minima of the Hamiltonian then correspond to different
possible assignments of community structures. It
is instructive to write the Hamiltonian (2) in the form

$$\mathcal{H} = -\sum_{i<j} \left( J_{ij} - \gamma \right) \delta_{\sigma_i \sigma_j}, \tag{4}$$

with $J_{ij}$ as the (weighted) adjacency matrix of the graph.
The ground state structure of this spin glass Hamiltonian 
corresponds to the community structure of the network. 
Fortunately, finding the ground state is difficult only for 
random networks which usually do not exhibit any clear 
community structure, and where the ambiguous commu- 
nity assignment corresponds to a typical spin glass situa- 
tion of multiple local energy minima. Relevant examples 
of networks with non-random community structure, how- 
ever, usually correspond to Hamiltonians with prominent 
ground states in large basins of attraction which makes 
our approach particularly practicable. 
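
The threshold for separating two communities can be checked with a short calculation. It assumes the Hamiltonian consists of the ferromagnetic edge term plus $\gamma$ times the number of equal-spin pairs, and compares the configuration in which $g_1$ and $g_2$ carry different spins with the one in which they share a single spin (the labels "split" and "merged" are introduced here for illustration):

$$
\begin{aligned}
\mathcal{H}_{\mathrm{split}} &= -J (m_1 + m_2) + \gamma \left[ \frac{n_1 (n_1 - 1)}{2} + \frac{n_2 (n_2 - 1)}{2} \right], \\
\mathcal{H}_{\mathrm{merged}} &= -J (m_1 + m_2 + m_{12}) + \gamma \, \frac{(n_1 + n_2)(n_1 + n_2 - 1)}{2}, \\
\mathcal{H}_{\mathrm{merged}} - \mathcal{H}_{\mathrm{split}} &= -J m_{12} + \gamma \, n_1 n_2 .
\end{aligned}
$$

The split configuration is therefore lower in energy exactly when $\gamma > J m_{12} / (n_1 n_2)$, that is, when $\gamma$ exceeds J times the density of links between the two groups. The same bookkeeping shows why the pairwise form is equivalent: summing $\delta_{\sigma_i \sigma_j}$ over all pairs $i < j$ counts $n_s (n_s - 1)/2$ pairs for each spin value s.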

The number of possible communities q is not a critical 
parameter in the algorithm: it only needs to be chosen 
large enough to accommodate all possible communities.
If the number of communities is smaller than q, the
remaining spin states will not be populated. However, 
since the runtime of the algorithm is linear in q, a rea- 
sonable value should be chosen (q < 100 was sufficient in 
our case). 



It remains to define a measure of the statistical significance
of the communities found. Given the number
of nodes in a community n, the number of inner links
$l_{in}$, and the number of outer links $l_{out}$, we can calculate
the expected number of possible equivalent communities
$E(n, l_{in}, l_{out})$ in a random network of the same size
(N, M) and connection probability p.
If $E(n, l_{in}, l_{out})$ is larger than 1, we can expect to find
such a community in a random network of the same size,
marking the border of statistical significance.
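
One way to evaluate such an expectation numerically is sketched below. The specific counting rule is an assumption of this sketch and not a formula taken from the text: every one of the C(N, n) node subsets is treated as a candidate, and in a random graph with independent link probability p it must show a prescribed pattern of $l_{in}$ present inner links and $l_{out}$ present outer links, with all remaining inner and outer pairs unlinked. The function name `log_expected_count` is hypothetical. Working with logarithms avoids numerical underflow for realistic network sizes.

```python
from math import lgamma, log


def log_expected_count(N, n, l_in, l_out, p):
    """Natural log of C(N, n) * p**(present links) * (1 - p)**(absent links).

    Assumed model: independent links with probability p (0 < p < 1).
    """
    # log of the binomial coefficient C(N, n) via log-gamma.
    log_choose = lgamma(N + 1) - lgamma(n + 1) - lgamma(N - n + 1)
    inner_pairs = n * (n - 1) // 2   # possible links inside the group
    outer_pairs = n * (N - n)        # possible links leaving the group
    present = l_in + l_out
    absent = (inner_pairs - l_in) + (outer_pairs - l_out)
    return log_choose + present * log(p) + absent * log(1.0 - p)


# Illustration with numbers matching a 128-node test network of average
# degree 16: a group of 32 nodes with 12 inner and 4 outer links per node.
value = log_expected_count(N=128, n=32, l_in=192, l_out=128, p=16 / 127)
print(value < 0.0)  # a negative log means an expected count below 1
```

Under this assumed rule a negative logarithm corresponds to an expected count below 1, i.e. a community that would not be expected to occur by chance.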

To practically find or approximate the ground state of 
our system we employ a simple Monte-Carlo heat-bath 
algorithm with simulated annealing [5]. Starting from
a temperature with an acceptance ratio of > 95%, the
system is subsequently cooled down using a decrement
function for the temperature of the form $T_{k+1} = \alpha T_k$
with $\alpha = 0.99$ or similar values for the k-th step, until
it reaches a configuration where no more than a given 
number of spin flips are accepted during a certain num- 
ber of sweeps over the network. In one such run, one 
reaches the ground state or another low lying local min- 
imum that corresponds to a community structure of the 
network. With a set of several runs, we are able to eval- 
uate the robustness of the community classification by 
sampling the local minima of the energy landscape of the 
Hamiltonian. The number of co-appearances of each pair of nodes in
one community is then recorded in an N × N matrix. We
then order the rows and columns of this matrix accord- 
ing to the assignment of communities from a single simu- 
lated annealing run. Well defined community structures 
then appear as blocks of high co-appearance along the 
diagonal. Off-diagonal instances of high co-appearance 
indicate a possible overlap between clusters. 
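
A single annealing run of the kind described above might look like the following minimal sketch. Several details are simplifying assumptions rather than the procedure used here: the start temperature `t_start` is fixed instead of being tuned to a 95% acceptance ratio, the run stops at a fixed `t_min` instead of monitoring accepted spin flips, and the function name `anneal_communities` is hypothetical. The energy is taken to be the ferromagnetic edge term plus $\gamma$ times the number of equal-spin pairs.

```python
import math
import random


def anneal_communities(n_nodes, edges, gamma, q=10, t_start=1.0,
                       alpha=0.99, t_min=1e-3, J=1.0, seed=0):
    """Heat-bath simulated annealing; returns one spin (community label) per node."""
    rng = random.Random(seed)
    neighbors = [[] for _ in range(n_nodes)]
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    spins = [rng.randrange(q) for _ in range(n_nodes)]
    counts = [0] * q  # n_s: number of nodes currently in spin state s
    for s in spins:
        counts[s] += 1

    temperature = t_start
    while temperature > t_min:
        for _ in range(n_nodes):  # one sweep = n_nodes single-spin updates
            i = rng.randrange(n_nodes)
            counts[spins[i]] -= 1  # remove node i from its current group
            links = [0] * q        # neighbours of i found in each spin state
            for j in neighbors[i]:
                links[spins[j]] += 1
            # Energy contribution of node i for each candidate state s:
            # -J per aligned neighbour, +gamma per other node already in s.
            energies = [-J * links[s] + gamma * counts[s] for s in range(q)]
            lowest = min(energies)
            # Heat bath: draw the new state from the Boltzmann distribution.
            weights = [math.exp(-(e - lowest) / temperature) for e in energies]
            spins[i] = rng.choices(range(q), weights=weights)[0]
            counts[spins[i]] += 1
        temperature *= alpha  # geometric cooling, T_{k+1} = alpha * T_k
    return spins


# Toy graph (an assumption for illustration): two triangles joined by one edge.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(anneal_communities(6, toy_edges, gamma=14 / 30, q=4))
```

Each update needs only the spins of a node's neighbours and the current group sizes, which is what keeps a sweep cheap. Repeating the run with different seeds gives the sample of low-lying minima from which robustness is assessed.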

Let us first check our algorithm by applying it to 
a number of computer-generated random test networks 
with known community structure as suggested in [6].
Nodes are assigned to communities and are randomly
connected to members of the same community by an average
of $\langle k_{in} \rangle$ links and to members of different communities
by an average of $\langle k_{out} \rangle$ links. Fixing the average degree
of all nodes to $\langle k \rangle = \langle k_{in} \rangle + \langle k_{out} \rangle = 16$, it becomes
more and more difficult for any algorithm to detect the
communities as $\langle k_{in} \rangle$ decreases at the expense of $\langle k_{out} \rangle$.
Sensitivity and specificity are benchmarked over all possi- 
ble pairs of nodes. As true positive (negative) we count a 
pair of nodes that is in the same (a different) community 
by design and is classified accordingly by the algorithm. 
We tested two sets of networks. The first is composed of 
four equally sized communities of 32 nodes each and the 
second is composed of four communities of 128, 96, 64 
and 32 nodes respectively. The performance of our algorithm
and, for comparison, that of Girvan and Newman
(GN) [7] is shown in Figure 1. Note the high sensitivity
and specificity of our algorithm for both types of networks.
When running our algorithm without simulated
annealing, but simply relaxing the system at temperature
zero from a random initial condition, it is extremely
fast, yet still performs as well as the GN method.

FIG. 1: Benchmark of the algorithm for networks with known
community structure and comparison with Girvan and Newman.
Top row: 4 communities of 32 nodes each, bottom row:
4 communities of 128, 96, 64 and 32 nodes respectively.
Symbol size corresponds to error bars.

Figure 2 shows the dependence of the sensitivity of the
algorithm on q in the case of the test network with equally
sized communities for four different values of $\langle k_{in} \rangle$. Note
that results do not depend on q. For the same type of
test networks, Figure 2 also shows the robustness of the
sensitivity with respect to the choice of $\gamma$. The better
the communities are defined (the larger $\langle k_{in} \rangle$), the more
robust are the results. The maxima of the curves for
all values of $\langle k_{in} \rangle$, however, coincide at $\gamma = p \approx 0.125$,
which again justifies this choice of parameter. The same 
statements apply to the specificity. 
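
The pairwise sensitivity and specificity used in these benchmarks can be sketched as follows, taking sensitivity as the fraction of same-community pairs (by design) that the algorithm also groups together, and specificity as the fraction of different-community pairs that it keeps apart. The function name and the list-based label format are illustrative assumptions.

```python
from itertools import combinations


def pairwise_sensitivity_specificity(true_labels, found_labels):
    """Return (sensitivity, specificity) computed over all pairs of nodes.

    Both arguments are sequences with one community label per node.
    Assumes at least one same-community and one different-community pair.
    """
    tp = fn = tn = fp = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_by_design = true_labels[i] == true_labels[j]
        same_as_found = found_labels[i] == found_labels[j]
        if same_by_design and same_as_found:
            tp += 1  # pair correctly placed together
        elif same_by_design:
            fn += 1  # pair wrongly separated
        elif same_as_found:
            fp += 1  # pair wrongly merged
        else:
            tn += 1  # pair correctly kept apart
    return tp / (tp + fn), tn / (tn + fp)


# Label values need not match between the two assignments; only the
# grouping matters. Here one node of the second community is split off.
designed = [0, 0, 0, 1, 1, 1]
detected = [5, 5, 5, 2, 2, 7]
print(pairwise_sensitivity_specificity(designed, detected))
```

Because the measure is defined on pairs, it is independent of how the spin values happen to be numbered in a given run.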

One real world example with known community structure
is the College Football network from ref. [7]. It represents
the game schedule of the 2000 season of Division
I of the US college football league. The nodes in the 
network represent the 115 teams, while the links repre- 
sent 613 different games played in the course of the year. 
The community structure of this network arises from the 
grouping into conferences of 8-12 teams each. On aver- 
age, each team has 7 matches with members of its own 
conference and another 4 matches with members of different
conferences. We perform a parameter variation in
$\gamma$ at ten values in the range $0.1p < \gamma < p$. At each value of $\gamma$
we relax the system 50 times from a randomly assigned
initial configuration at T = 0 using q = 50. Figure 3
shows the resulting 115 × 115 co-appearance matrix, normalized
and color coded. The ordering of the matrix corresponds
to the assignment of the teams into conferences
according to the game schedule. The dashed blue lines
indicate this. Apart from regaining almost exactly the
known community structure, our algorithm is also able
to detect inhomogeneities in the distribution of intra- and
inter-conference games. For instance, we see a large overlap
of the Pacific Ten and Mountain West conferences and
also a possible subdivision of the Mid American conference
into two sub-conferences, one of which contains Ball
State, Toledo, Central, Eastern, Northern and Western
Michigan. This is due to the fact that geographically
close teams are more likely to play against each other, as
already pointed out in ref. [7].

FIG. 2: Robustness of results for the test network with 4
communities of 32 nodes each. Left: Sensitivity vs. q at the
end of a Monte-Carlo optimization at different values of $\langle k_{in} \rangle$.
Averaged over 50 graphs. Right: Sensitivity for q = 25 as a
function of $\gamma$ for different values of $\langle k_{in} \rangle$. All results averaged
over 10 graphs.

FIG. 3: Co-appearance matrix for the football network.
$0.1p < \gamma < p$, matrix ordering taken from assignment of teams
into conferences according to game schedule.

FIG. 4: Co-appearance matrix for the reduced version of the
protein folding network. $0.1p < \gamma < p$, matrix ordering taken
from a simulated annealing run of the full network.

Finally we consider a large real world example with
only partially known community structure, a large protein
folding network compiled by Rao and Caflisch [8].
This network represents the conformation space of a
20-amino-acid peptide sampled by molecular dynamics at
the melting temperature. $5 \times 10^5$ subsequent conformational
snapshots were taken at time intervals of 20 ps,
resulting in 132168 different configurations sampled and 
228972 observed transitions between two different con- 
formations. These represent a network of conformations, 
where a link indicates that two conformations follow each 
other in time. Analysis of this network yields valuable in- 
formation about the free energy landscape of the folding 
Hamiltonian without the need to project it onto arbitrarily
chosen coordinates. Applying the algorithm to
the complete unweighted network using q = 50 and $\gamma = p$
yields a largest community of 16,000 nodes, correctly
corresponding to the folded state (FS). The statistical 
weight of the nodes in this community was found to be 
55% of the total weight which confirms the expectation of 
the folded and denatured state being equally populated 
at the melting temperature. The characteristic confor- 
mations of the denatured state, the high enthalpy, high 
entropy conformations, such as the helical conformations 
(HH), as well as low entropy conformations such as the 
curl like trap (TR) are also recognized as communities. 
Again, $\gamma$ is varied over $0.1p < \gamma < p$ at T = 0
with 50 repetitions at each value of $\gamma$ and q = 50. For
this, we used the reduced version of the folding network
as in [8] that contains only nodes which are visited 20
times or more in the course of the simulation, resulting
in 1287 nodes and 23948 links. Figure 4 shows the resulting
1287 × 1287 co-appearance matrix. The
rows and columns are ordered with respect to one single
simulated annealing run at $\gamma = p$. Thus, we see how well
the ground state is approximated by the local minima
and how robust the assignment into communities is with
respect to $\gamma$. Again we find a clear characterization of
the FS and TR communities. The helical conformations
(HH), however, do not occur in one community for all
values of $\gamma$, which indicates many different possible
assignments into communities and is an indication of their
high entropy nature. Furthermore, a number of putative 
transition states (pTS) could be assigned, that mediate 
the folding from certain denatured configurations into the 
folded state. 
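
The co-appearance matrices used in the last two examples can be accumulated with a few lines of code. The sketch below assumes that each run yields one community label per node and that the matrix is normalized by the number of runs; both the normalization and the function names `coappearance_matrix` and `reorder` are illustrative assumptions.

```python
def coappearance_matrix(assignments):
    """Fraction of runs in which each pair of nodes shares a community.

    `assignments` is a list of runs, each a list with one label per node.
    """
    n = len(assignments[0])
    matrix = [[0.0] * n for _ in range(n)]
    for labels in assignments:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    matrix[i][j] += 1.0
    runs = len(assignments)
    return [[value / runs for value in row] for row in matrix]


def reorder(matrix, reference_labels):
    """Sort rows and columns by the labels of one reference run."""
    order = sorted(range(len(reference_labels)), key=lambda i: reference_labels[i])
    return [[matrix[i][j] for j in order] for i in order], order


# Three hypothetical runs on four nodes; label values differ between runs.
runs = [[0, 0, 1, 1], [0, 0, 0, 1], [2, 2, 3, 3]]
matrix = coappearance_matrix(runs)
blocks, order = reorder(matrix, reference_labels=runs[0])
for row in blocks:
    print(row)
```

After reordering by a reference run, stable communities show up as blocks of values near 1 along the diagonal, while intermediate values off the diagonal point to nodes that change community between runs.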

In conclusion, we discuss a new algorithm for commu- 
nity detection in complex networks based on a modified 
q-state Potts model. Communities appear as domains 
of equal spin value near the ground state of the system, 
which is approximated through Monte-Carlo optimiza- 
tion. Only local information is used to update the spins 
which makes parallelization of the algorithm straightfor- 
ward and allows the application to very large networks. 
On both the computer-generated and the real world networks
studied here, the algorithm performs fast, often considerably
faster than current state-of-the-art algorithms.
Without using prior knowledge it automatically detects 
the number of communities as the number of occupied 
spin states. As the algorithm is non-deterministic and
non-hierarchical, it allows for the quantification of both
the stability of the communities and the affiliation
of a node to more than one community ("fuzzy
communities").

[...] discussions and A. Caflisch, M. Newman, and F. Rao for 
the supply of network data and helpful comments. 



[1] L. Kaufman and P. Rousseeuw, Finding Groups in Data: 
an introduction to cluster analysis (Wiley-Interscience, 
1990). 

[2] M. E. J. Newman, Eur. Phys. J. B 38 (2004). 
[3] Y. Fu and P. W. Anderson, J. Phys. A: Math. Gen. 19, 
1605 (1986). 

[4] M. Blatt, S. Wiseman, and E. Domany, Phys. Rev. Lett. 
76 (1996). 

[5] S. Kirkpatrick, C. D. Gelatt Jr., and M. Vecchi, Science 220, 671
(1983).

[6] M. Newman, Phys. Rev. E 69, 066133 (2004).

[7] M. Newman and M. Girvan, Proc. Natl. Acad. Sci. 99,
7821 (2003).
[8] F. Rao and A. Caflisch, J. Mol. Bio. (2004). 



